perm filename A04[106,RWF] blob sn#773538 filedate 1984-10-19 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00007 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	Dynamic Programming
C00010 00003	The algorithm can be summarized this way:
C00012 00004	Exercise - Dynamic Programming
C00016 00005	Dynamic Programming/Spelling Checker
C00023 00006	At a certain stage, we have partly filled the table, as shown below:
C00029 00007	Assume we are given a road map, like the one below, that shows the distance
C00034 ENDMK
C⊗;
Dynamic Programming

I have a large supply of blocks, of thicknesses 7, 11, and 16 inches.  I want
to know if I can make a stack exactly 100 inches high.  The problem seems very
complicated; perhaps I should try all ways of stacking 14 or fewer blocks.
I feel discouraged; there must be a million ways.  Then my accountant tells me
that because of the new Block Conservation law, I must use the smallest possible
number of blocks to make the stack.  An old heuristic rule says that if the problem
is too big, I should try a smaller version.  What heights can I cope with?  Height
0 is obvious.  So are heights 1 to 6.  Height 7 can only be done with one block,
and so on.  I see that if I write down these results in a table as I go, when I
get to height 100 I will be able to say, "A stack of height 100 is a stack of height
100-7, 100-11, or 100-16, topped off with one more block."

I can look up in table entries for 93, 89, and 84 the smallest number of blocks to
give one of these heights, add one, and that is the smallest number of blocks to
give a stack of height 100.

Along the way up to 100, all the intermediate heights are treated the same way.
The table I produce starts this way:


height	number 
	of blocks	20	--	40	4
  0	   0		21	3	41	4
  1	   --		22	2	42	6
  2	   --		23	2	43	3
  3	   --		24	--	44	4
  4	   --		25	3	45	4
  5	   --		26	--	46	4
  6	   --		27	2	47	5
  7	   1		28	4	48	3
  8	   --		29	3	49	4
  9	   --		30	3	50	4
 10	   --		31	--
 11	   1		32	2
 12	   --		33	3
 13	   --		34	3
 14	   2		35	5
 15	   --		36	4
 16	   1		37	4
 17	   --		38	3
 18	   2		39	3
 19	   --		


The program for this algorithm reads the desired height, the number of block sizes,
and the sizes.

READ(HEIGHT,NUM);

FOR I:= 1 TO NUM DO READ(BLOCK[I]);

FOR I:= 1 TO HEIGHT DO S[I]:= 10000;

S[0]:= 0;

FOR H:= 1 TO HEIGHT DO

	FOR I:= 1 TO NUM DO

		IF H>=B[I] THEN

			IF S[H-B[I]]<S[H] THEN

				S[H]:= S[H-B[I]]+1

This is the method of dynamic programming, for a single parameter.  Instead of
only finding the best solution for a single value of the parameter (here, 100),
I find and save it for all values up to the desired one.  As I go, I use the
saved knowledge to make each further step easy.  As a result, the computational
effort is only linearly proportional to the value of the final parameter.

Dynamic prograaming is not restricted to functions of one variable, as the next
application shows.

I play a simple dice game with an opponent: he rolls two dice, then I roll
two dice, and we alternate until one of us has accumulated a total of 30 or more. 
He offers to bet three to one that he will win.  I should accept if my winning
chance is more than 0.25, and decline if less.
(This problem is a simplification of problems arising in backgammon endgames).

I can get a tolerable approximation of my chances by playing the game by myself
a few hundred times, but if the actual chance is within a few percent of 0.25,
I would need to play perhaps a thousand times to avoid a significant chance of 
making the wrong decision.  I can program a computer to make the same experiment, 
but it is easier, faster, and more reliable to have it calculate the exact 
probabilities by another application of the dynamic programming method.  Let f(A,B) 
be the chance for the player X, whose turn is next to win when he needs a total
of A to win, and the second player, Y,  needs a total of B to win.  If A is zero 
or negative,  f(A,B) = 1; if B is zero or negative, f(A,B) = 0.  Otherwise, X
rolls two dice, getting numbers i and j between 1 and 6, and Y now gets a turn, 
so Y's chance is f(B,A-i-j).  The two players' chances total 1.00, so X's chance
after rolling the dice is 1-f(B,A-i-j).  X's chance before rolling the dice is
found by averaging over all the possible dice rolls; it is 

1 - 1/36 (Sigma)            f(B,A-i-j)
		1≤i≤6,1≤j≤6

I can calculate f(A,B) for each value of A and B up to 25, and save each in
a table as I go, for use in calculating f for larger values of A and B.  I
must be sure, though, that when I calculate f(A,B) I have already
calculated f(B,A-2) through f(B, A-12);  this can be guaranteed if I do the
calculation in the order

f(1,1)
f(2,1), f(1,2), f(2,2)
f(3,1), f(3,2), f(1,3), f(2,3), f(3,3)
f(4,1), f(4,2), f(4,3), f(1,4), f(2,4), f(3,4), f(4,4)
etc.

(Another satisfactory order would be in order of increasing sums of A and B:

f(1,1)
f(2,2),f(2,1)
f(1,3),f(2,2),f(3,1)
f(1,4))
The algorithm can be summarized this way:

	Initialize;

	FOR LARGER:= 1 TO 30 DO
		BEGIN
		FOR K:=1 TO LARGER-1 DO
			GETF(LARGER, K);
		FOR K:=1 TO LARGER DO
			GETF(K, LARGER)
		END;

	WRITE F[30,30]

where the procedure GETF(A,B) has this body:

	BEGIN
	SUM:=0,
	FOR I:=1 TO 6 DO
		FOR J:=1 TO 6 DO
			SUM:=SUM + F[B,A-I-J];
	F[A,B]:=1.00 -SUM/36.0
	END

Exercise:

Which values of the array F need to be initialized?  What  should the
subscript bounds on F be?

The theme of this example is that the easiest way to compute a function for
one assignment of argument values may be to compute a complete table of the
function. 
Exercise - Dynamic Programming

Some positions that arise in the game of backgammon become simple races,
where the first player to roll a large enough cumulative total on the dice
wins the game.  A player rolls two dice; if the numbers on the dice are
different, the player gets the sum, but if the numbers are equal, he gets
twice the sum; 1 & 2 gives the player 3, while 6 & 6 gives him 24.  

Calculate the average number of turns required for a player to accumulate a
total of 100.  Warning: don't fall into the trap of dividing 100 by the
average roll (8 1/6).

Harder Exercise - Dynamic Programming

Calculate the winning chance of the player whose turn is next in a
backgammon race (see previous problem), if the first player needs a total
of 38, and the second a total of 40.

Hardest Exercise - Dynamic Programming (Suitable for term project)

In backgammon as played for money, at any one time one (or initially both)
of the players has the right to double the stakes before rolling the dice,
if he thinks that on the average he will win the most money by doing so. 
The opponent may either give up the game, paying the current stake, or
agree to let the game continue for twice the current stake, in which case
the first player proceeds with his dice roll.  When a player uses the right
to double, he loses it and the other player gets it, so doubling alternates
between the players.

In the situation of the previous problem, if the first player has doubled
earlier in the game, so that the current stake is 2 units,  on the average
what will be the first player's net winnings, counting losses as negative
winnings?  Assume that each player offers and accepts doubles to maximize
average net winnings.  To make the problem harder, assume that neither
player has doubled; find the first player's average winnings, and determine
whether he should double before his first roll of the dice.

Dynamic Programming/Spelling Checker

Increasingly, people are using computers as an aid to good writing.  With a
good text-editing program, the writer can by a few keystrokes change a name
throughout a manuscript, can move paragraphs around without scissors and
paste, can indent and center and justify effortlessly.  One valuable
component of a good text editor is a spelling checker, which warns the 
writer of words not found in the dictionary.  If the spelling checker can
suggest a similar word as the correct spelling, such as ``fluorine'' for
the author's ``flourine'', or ``minuscule'' for ``miniscule,'' it may help
even more.  To find a similar word, a spelling checker must have a quantitative
measure of the difference between words.  A popular measure is to count the
smallest number of changes that will turn one word into the other.

The allowed changes are usually four:

     (1) Deletion of a letter: ``sherbert'' becomes ``sherbet.''
     (2) Insertion of a letter: ``curiculum'' becomes ``curriculum.''
     (3) Transposition of adjacent letters: ``niether'' becomes ``neither.''
     (4) Replacement of a letter: ``defence'' becomes ``defense.''

It is easy to design a program to test if two words differ by a single change,
but to find the similarities between Wednesday and Whensdi, Wooster and Worcester,
Chumly and Chaulmondeley, takes work.  The best method known iterates over all
the prefixes of the two words, starting with the shortest ones, and calculates
the minimum number of changes needed to change the one into the other; these
differences are stored in a table, for later use in dealing with longer prefixes.
The last ``prefixes'' looked at are the complete words themselves.

Let`s look at a concrete example, the words ``warranty'' and ``guarantee.''
Starting with the prefixes of length one, ``w'' becomes ``g'' by a single
letter replacement, so that distance is 1.  For ``w'' to become ``gu'' requires
more than one operation; if we suspect that the last operation is the addition
of the final ``u,'' ``w'' must first have been changed into ``g,'' we already
know that requires one operation, so we have found a way to change ``w'' to
``gu'' in two steps.  Each other way of getting the ``u'' has to be tried
before confirming that two steps is the least possible.


	G   U   A   R   A   N   T   E   E

    0   1   2   3   4   5   6   7   8   9
      \
W   1   1 - 2   3   4   5   6   7   8   9
	       \
A   2   2   2   2   3   4   5   6   7   8
		|
R   3   3   3   3   3   4   5   6   7   8
		  \
R   4   4   4   4   3   4   5   6   7   8
		      \
A   5   5   5   5   4   3   4   5   6   7
			  \
N   6   6   6   6   5   5   3   4   5   6
			      \ 
T   7   7   7   7   6   6   4   3 - 4   5
				      \
Y   8   8   8   8   7   7   5   4   4   5


where the marked path corresponds to the sequence

	WARRANTY
	GARRANTY
	GUARRANTY
	GUARANTY
	GUARANTEY
	GUARANTEE

Here is the algorithm in Pascal.  The two words are in character arrays
W1 and W2; their lengths are in W1L and W2L.

	FOR I:= 0 TO W1L DO
		TABLE[I,0]:= I;

	FOR J:= 1 TO W2L DO
		TABLE[0,J]:= J;

	FOR I:= 1 to W1L DO
		For J:=1 TO W2L DO
			BEGIN
			IF W1[I]=W2[J] THEN BEST:=TABLE[I-1,J-1] (*same letter*)
			ELSE BEST:=TABLE[I-1,J-1]+1; (*change one letter*)

			IF TABLE[I-1,J]<BEST
			THEN BEST:= TABLE[I-1,J]+1; (* delete one letter of W1*)

			IF TABLE[I,J-1]<BEST
			THEN BEST:=TABLE[I,J-1]+1; (* insert one letter of W2*)
			IF(I>1) AND (J>1) AND (W1[I-1]=W2[J] AND (W1[I]=W2[J-1])
			AND (TABLE [I-2,J-2]<BEST)
				THEN BEST:=TABLE[I-2,J-2]+1;
			TABLE[I,J]:=BEST
			END

	WRITE (W1,W2, TABLE[W1L,W2L])
At a certain stage, we have partly filled the table, as shown below:


	G   U   A   R   A   N   T   E   E
    0   1   2   3   4   5   6   7   8   9 
W   1   1   2   3   4   5   6   7   8   9
A   2   2   2   2   3   4
R   3
R   4
A   5
N   6
T   7
Y   8

Wanting the distance from ``wa'' to ``guaran,'' we try the possibilities.

   (1) The final ``a'' was deleted, after changing ``w'' to ``guaran'' in
       6 steps; total, 7.

   (2) The final ``n'' was inserted, after changing ``wa'' to ``guara'' in
       4 steps; total, 5.

   (3) Transposition does not change ``wa'' to ``an'', so we drop that approach.

   (4) Replacement of ``a'' by ``n,'' after changing ``w'' to ``guara'' in
       5 steps; total, 6.

Insertion yields the shortest solution at 5 steps, so we enter 5 into the next
vacancy in the table.  When the table is completed, it is:

*****************
(page missing)

The underlying method of this algorithm finds the best way to accomplish a
task by finding and recording the best way to accomplish a moderate number of 
partial tasks, at least one of which has to be done along the way, and then
trying all ways that the complete task can be a one-step continuation of one of
the partial tasks.  The method is called dynamic programming. 
************[references]

Exercise: Program the difference calculation taking into account estimates of
probability of each error.  Assume a replacement is much more likely for
letters adjacent on the keyboard.  Assume insertion and deletion are more
likely for a doubled letter, or for a vowel next to another vowel.  Assess
a cost for each change, roughly proportional to the logarithm of its estimated
likelihood.

Exercise:  This is a tough one.  Using an algorithm akin to the algorithm for
word differences, find the difference in sounds between the words. Your
algorithm would find no difference between "ghoti" and "fish", because

(1) GH in "enough" and F in "fish" can have the same sound.
(2) O in "women" and I in "fish" can have the same sound.
(3) TI in "action" and SH in "fish" can have the same sound.

To approach this problem:

(1) Using a good dictionary, list all the phonemes of English.
(2) Make up for computer input a list of all the letter sequences that can stand
    for each phoneme.  The phoneme at the end of "fish" for example, can be
    represented by SH and GH, but also by CH in "chivalry", S in "sure", and
    possibly C,H,J, or X, if you allow a few Portuguese ("Xica"), Polish,
    Spanish, or Serbo-Croatian words.
(3) Use the dynamic programming method for each prefix to find the smallest
    number of phonemic changes to make one into the other.

Exercise:  There are ten teams in a certain baseball league. Allowing for ties, 
how many different rankings are possible at the end of the season? 
(To check your assumptions: for a three-team league, there are thirteen 
possible rankings.)
Assume we are given a road map, like the one below, that shows the distance
between those pairs of cities that are directly connected by a single
stretch of highway.  How can we produce a table that shows the length of
the shortest route between each pair of cities?











A standard paradigm for approaching problems that involve a large number of
entities (in this case, cities) is to solve the problem first for small
subsets, then larger and larger ones.  If we had already solved the problem
for six cities, and stored the results in an array, what would we need to
do when a seventh city is introduced?

(1) The distance from the new city to itself is always zero.

(2) The best route from the new city (`A') to one of the earlier cities
(`B') is either a direct road, or consists of going from A to another of
the earlier cities (`C') and taking the best route from C to B, naturally
not passing through A; by trying every possible city in the role of C, we
can find the shortest route from A to B.

(3) If D and E are previously treated cities, the best route from city D 
to city E is either the previous best route, or goes from D to the new city A, 
and from A to E.  In the latter case, step 2 has already found the best such 
routes, and we can add their lengths.

By using the calculations implied by (1), (2), and (3) for each city in
turn, we build up the table until it reflects the entire map.  To present
the algorithm in Pascal, assume we have an array DIRECT, where D[I,J] gives
the length of the direct connection from city I to city J.  If there is no
direct connection, DIRECT[I,J]  is a very large number BIG, like 10000, 
which is sure to be larger than the length of the shortest route.

We want to fill in an array DISTANCE, so that DISTANCE[I,J] is the length
of the shortest route from city  I to city J.  The rules above are the
basis of a Pascal algorithm.

	FOR A:=1 TO N DO
		DISTANCE [A,A]:=0;
		FOR B:=1 TO A-1 DO
			BEGIN
			DIST :=DIRECT[A,B];
			FOR C:=1 TO A-1 DO
				IF DIST > DIRECT[A,C]+DISTANCE[C,B]
					THEN DIST:=DIRECT[A,C]+DISTANCE[C,B];
			DISTANCE[A,B]:=DIST;DISTANCE[B,A]:=DIST;
			END
		FOR D:=1 TO A-1 DO
			FOR E:=1 TO A-1 DO
				IF DISTANCE[D,A]+DISTANCE[A,E] < DISTANCE[D,E]
				THEN DISTANCE[D,E]:=DISTANCE[D,A]+DISTANCE[A,E]

Exercise:  Modify the above algorithm so that it also computes FIRST[I,J],
the number of the first city on the shortest route from I to J.

Exercise:  After performing the above modification, add a stage to the
algorithm that will read the numbers of two cities and print out the
numbers of all the cities along the shortest route between them.

Here is a simpler Pascal algorithm for the same problem; it is, however,
harder to explain.  See if you can discover why it works.

	FOR I:=1 TO N DO
		FOR K:=1 TO N DO
			DISTANCE[I,K]:=DIRECT[I,K];
	FOR I:=1 TO N DO
		FOR K:=1 TO N DO
			FOR J:=1 TO N DO
				IF DISTANCE[I,K]>DISTANCE[I,J]+DISTANCE[J,K]
					THEN DISTANCE[I,K]:=DISTANCE[I,J]+DISTANCE[J,K]